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1. Introduction 



Lattice simulations with a relativistic ft-quark will become feasible with the next generation of 
super computers [|l],|2L]3]|. At least in the quenched approximation, these machines should allow for 
lattice sizes L/a = 0(100) so that the conditions amt < 1 and L ~ 1 — 2fm can be fulfilled at the 
same time (see [Q, ||] for studies approaching this regime). Among other things, such computations 
will provide clear tests of effective field theories like HQET [^] or of alternative formulations for 
relativistic heavy quarks on the lattice [Q, §]. Depending on the outcome, further evidence can 
be produced supporting the use of these theories, especially in view of computations with light 
dynamical flavors, where lattice sizes suitable for the inclusion of relativistic (quenched) heavy 
quarks are far to come. 

On the numerical side simulations in that regime may be affected by round-off errors. The lat- 
tice quark propagator is usually computed numerically by employing CG-type algorithms to solve 
the system of linear equations D\j/ = r\ for i/a, where the matrix D is some discrete representation 
of the Euclidean Dirac operator. Defining the time slice norm of a Dirac vector y{t,x) on the time 
slice t as 



\v\t = JL (VS&0)*VS&0, ( L1 ) 



(greek indices represent spin, roman indices colour) we argue that in situations where 

min|y/|r 

: arithmetic precision, (1.2) 



max \\jf\ t 



the solver residual 



becomes an un-reliable indicator for convergence and the solver itself fails to produce sensible 
results. Both effects are due to accumulated round-off errors. 

We propose an algorithm with the potential to overcome these problems. We suggest decom- 
posing the time direction of the lattice into a sufficient number of adjacent or overlapping domains 



to avoiding the situation in eq. (1.2) within each domain. By applying the Schwarz alternating 
procedure []|, 10] to these domains, we are able to recursively construct the solution over the whole 
lattice in a controlled way. 

We first present a numerical example for the case of free Wilson fermions in the 4-dimensional 



QCD Schrodinger Functional [ |1 1| ] where a CG-type solver produces unreliable results although 
proper convergence is indicated by the residual in eq. ( \l.3\ >. We then review the proposed algorithm 
and demonstrate its efficiency considering as an example the 1 -dimensional Dirac equation. We 
compare our numerical results to an exact reference solution for the propagator in this model, 
computed with Mathematica. The term exact refers to the fact that within this framework one 
can vary the arithmetic precision even beyond double precision and thereby gain confidence in the 
numerical solution. As reported in [|L2|, Il3l 111], the Schwarz alternating procedure can of course 



also be applied to the fully interacting 4-dimensional theory. 
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Figure 1: BiCGstab solver residual r for the solution of the free Wilson Dirac propagator on a 12 3 x T-lattice 
with T = {60, 70, 80, 90, 100, 110} after convergence (triangles) and after having changed the solution on the 
last 10 time slices (squares) and on the last 20 time slices (circles) by +10% and +100% respectively. 



2. The problem in 4 dimensions 

We illustrate the problem with a numerical study in the QCD Schrodinger Functional using the 
double precision version of the MILC code [[131] ■ The parameters were am ~ 0.5 (fc = 0.111111), 
L/a = 12, and T/a = {60,70,80,90, 100, 110} (a = 1 in the following) and we used a unit gauge 



background. Using the stabilised bi-conjugate gradient algorithm (BiCGstab) [16], we solved 
D\j/ = v\ for which corresponds to a column of the quark propagator. As is common practise, 
we used r < 10~ 15 for the stopping criterion. 

As a test of the solver residual and of proper convergence for large t, we changed the solution 
yr{t,x) by +10% (or +100%), once for t > T - 10 and once for t > T - 20 for all the values of T 
mentioned above and then recomputed r. Figure [I] shows the results. The triangles represent the 
achieved solver residual for each choice of T and the squares and circles represent the residual r 
after having changed the solution for t > T — 10 and t >T — 20, respectively. We see that above 
a certain time-extent T of O(100) the residual ceases to be sensitive to a change of the solution by 
10% (or 100%). In this situation the computed solution y cannot be considered correct. 



3. The 1 -dimensional Dirac operator 

The problem we observed in 4 dimensions is also present in 1 -dimension. To illustrate this we 
implemented the equation Dy = tj in MATLAB with D given by the 1-dimensional free Wilson 
lattice Dirac operator with periodic boundary conditions, 

D$ = <5 A> , - k [8 y , x+ 1 ( 1 + 7i ) + 1 (1 - Ti )] , (3-1) 

where K=l/ (2m + 2) is the hopping parameter. We solved for y using a stabilised BiCG algorithm 
in double precision. Since we implemented the Dirac equation on the torus, we expect the problem 
to appear at around twice the time extent observed in the Schrodinger functional. Indeed, for 
T = 180, m = 0.5 and r = 10~ 15 the solution vector varies in time by more orders of magnitude 
than can be represented by the arithmetic precision and vanishes exactly for the 10 central lattice 
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Figure 2: Domain of size L\ = 6. The black dots represent the bulk of the domain A, the interior boundary 
dA* is represented by the half-filled dots and the exterior boundary dA is represented by the grey circles. 
The complement of A is A*. 

points. One therefore expects the solution to be wrong for a time interval larger than 10 time slices 
around T /2. 

4. The (multiplicative) Schwarz alternating procedure (SAP) 

We now give our proposal to circumvent the problem. We decompose the time-direction of 
the lattice into a number of domains, such that the solution is expected to decay by fewer orders of 
magnitude than covered by the arithmetic precision within each domain. 

We adopt the notation given by Liischer [|12|] and briefly review some basic definitions. We de- 
compose the problem into « dom non-overlapping domains. Each domain A has an interior boundary 
d A* and an exterior boundary dA (cf. figure Q). The position-space Dirac operator may be written 
in the form 

D=( Da ° dA ), (4.1) 



D dA * D 



A" 



where the matrices Da and Da* act on the domain A and its complement A* , respectively. The off- 
diagonal matrices D^a and D^ A * contain those interactions that couple A to the adjacent domains. 



Following [12], the algorithm we propose visits each of the n^om domains in successive sweeps 



and updates the current approximation y to the solution of Dy = r\ according to 

y' =y + D K \r}-Dy). (4.2) 
Here we take y = as the initial guess. We introduce the domain based stopping criterion 1 



''do 



maxj^|D^-T7| A /|vA| A j < 1(T 15 . (4.3) 

Note that we normalise with respect to the solution vector since one usually uses a 8-f unction as 
source 77. 

5. Numerical results 

We first computed y on the whole lattice in 1 dimension by means of Fourier transformation in 
Mathematica. Since this software allows for arbitrary numerical precision we could thereby obtain 
an exact reference solution. We then implemented the SAP solver in MATLAB, where all further 
numerical tests have been performed. Within each sub-domain the current update for the solution y 



Here I ■ I a is the Dirac norm restricted to the domain A. 
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Figure 3: Left: Example of solver accuracy for Hdom = 3,m = 0.5 and T = 120 (vertical lines indicate the 
domain-borders). The blue lines (red circles) represent the relative error of the result obtained by the BiCG 
(SAP) solver with respect to the exact solution. Right: Time-slice residual for the problem 1 (rj —D\f/)^d 
after the 8th sweep of the SAP solver over the 1st domain (T — 120). 



is computed using a BiCG solver which runs until convergence. As an example we discuss the case 
T = 120 and m = 0.5. The SAP solver takes about 10 times more iterations than the conventional 



(unreliable) BiCG with a global stopping criterion like in eq. (|1.3|). Notice that the matrix x vector 
operations needed in the SAP solver clearly involve smaller matrices. However, for heavy quarks 
the condition number of the Dirac matrix is not very large and the main issue is rather the precision. 
The results for both algorithms are illustrated in figure ^] in terms of the time-slice relative error 
with respect to the exact solution. Two comments are in order : 

• Without domain decomposition the solution y deviates strongly from the exact solution de- 
spite alleged proper convergence of the solver indicated by the residual r. 

• The circles in fig. || show the relative error for the solution computed with the method 
suggested in this work. Notably it stays at the desired level over the whole time extent of the 
lattice, indicating uniform convergence. 

6. Conclusions 

We have given numerical evidence that conventional CG-solvers run into round-off problems 
when the lattice volume and the quark mass are large. In particular the solver residual r given in 



eq. ( |1.3| ) is misleading in these cases. We have performed preliminary tests of an algorithm and a 
residual based on the Schwarz alternating procedure that do not suffer from these problems. 

In contrast to the conventional solver, the algorithm we suggest converges to a constant preci- 
sion over the whole lattice. Still one might expect the local residual to grow within each domain. 
This is indeed visible when the problem D/^{r\ —Dyf) = 5 is considered (see figure ||, right plot). 

The parameter range where the algorithm applies is complementary to that for conventional 
CG-solvers for small quark masses on the one hand and to that for the procedure based on the 



hopping parameter expansion for very large quark masses [17] on the other hand. The algorithm 
suggested here should in fact allow a more precise assessment of the range of quark masses where 
the latter method is applicable. 
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The implementation in 4 dimensions should be straightforward. In QCD a gain in performance 
could presumably be achieved by starting with the free propagator as initial guess. It would be very 
interesting to investigate the influence of overlapping domains on the solver performance. 
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